Automatic tone-following method and system for music accompanying devices

ABSTRACT

The present invention provides an automatic tone-following method and system for music accompanying devices, which detects the frequency of the singer's voice instantly and continuously and compares it with the theme tone frequency of the accompanying music to estimate the error between the tones of the singer and the music, so as to adjust the tone of the music to match the tone of the singer's voice. The present invention calculates the fundamental frequency of the user's voice for every short section of time through a tone estimator, converts the fundamental frequency of the user's voice into a user scale sequence in a scale sequence recorder, and then compares the user scale sequence with the theme scale sequence through a scale matcher. A transposition judger determines whether a transposition is needed, so that the scale parameter in the music synthesizer can be adjusted.

BACKGROUND OF INVENTION

1. Field of the Invention

The present invention relates generally to an automatic tone-following method for music accompanying devices, as well as an innovative design of a tone-following system.

2. Description of Related Art

When ordinary people sing a song along with accompanying music (for example, using a karaoke machine), the pitch is easily lost because the tone of the accompanying music is too high or too low, and the tone of the singer cannot keep up with the tone of the accompanying music. As a result, there is a disharmony between the rhythm of the song and that of the music, which greatly affects the effect of singing.

In view of the aforementioned problem, related manufacturers have developed an apparatus for the music accompanying device to change the tone of the accompanying music according to the tone of the singer. However, the technology adopted measures the tone of the singer over a preset time cycle and obtains an “average tone” within this time cycle through calculation. The “average tone” is then compared to the reference tone of the matching accompanying music to provide a disharmony signal, and the tone of the accompanying music is changed accordingly. Because this prior-art automatic tone-following method calculates the tone of the singer as an average value (i.e. average tone) within a time cycle, each time interval (e.g. 5 sec) over which an average value is obtained already causes an obvious delay relative to the singing. Moreover, the time needed for calculation and comparison makes the delay more obvious. Hence, in actual application, the process of changing the tone of the accompanying music to meet the tone of the singer cannot achieve good instantaneity: the change of tone will often occur only after the singer has completed one sentence of the lyric and is going on to the next sentence.

Also, as the method disclosed above compares the values between two fixed points, it is difficult to obtain an accurate transposition value. Therefore it cannot meet the expectation of the user, and has room for improvement.

Thus, to overcome the aforementioned problems of the prior art, it would be an advancement in the art to provide an improved structure that can significantly improve the efficacy.

Therefore, the inventor has provided the present invention of practicability after deliberate design and evaluation based on years of experience in the production, development and design of related products.

SUMMARY OF THE INVENTION

-   1. The automatic tone-following method for music accompanying devices disclosed in the present invention does not average the user's voice tone, but calculates it for every short section of time (e.g. 0.1 sec), and uses the scale sequence recorder 12 to convert the fundamental frequency of the user's voice into the user scale sequence 121. That is to say, the present invention compares the theme scale sequence 14 with the user scale sequence 121, instead of comparing average tones. The scale matcher 13 compares the matching degree of a section of scale sequence: it dynamically compares the scale sequence curve over a certain period of time and outputs the scale difference at the optimum match, instead of comparing the average value of the tone over that period. Hence, the transposition value obtained has a higher accuracy, and an optimum tone adjustment effect can be obtained to better meet the user's need.

-   2. The technical features of the present invention are as follows: direct acquisition of the theme of the recorded song; no need for complicated calculation processes; low system computation load; low occupation of system resources; and consequently higher operational efficiency and instantaneity. Hence, the present invention has achieved a practical advancement by considerably reducing the problem of delay in prior-art systems. Although the invention has been explained in relation to its preferred embodiment, it is to be understood that many other possible modifications and variations can be made without departing from the spirit and scope of the invention as hereinafter claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a systematic block diagram of the automatic tone-following method for music accompanying devices of the present invention.

FIG. 2 is a block diagram of the action process of the scale matcher of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

FIGS. 1 and 2 depict a preferred embodiment of the automatic tone-following method for music accompanying devices according to the present invention. Such an embodiment is for description purposes only, and its structure shall not limit the scope of the patent application. Said automatic tone-following method is described below:

As shown in FIG. 1, for each small section of time (about 0.1 sec), the fundamental frequency is calculated through a tone estimator 11. The tone estimator 11 calculates the fundamental cycle (period) or frequency of this section of sound, which can be obtained by finding the maximum value of an autocorrelation function, or from the relative position or distance of the peak values. The relation between the cycle and the frequency is:

Fundamental frequency=sampling frequency/fundamental cycle
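As an illustration only, the relation above can be combined with an autocorrelation peak search in a few lines of Python. This is a minimal sketch, not the disclosed implementation: the function names are hypothetical, and the default sampling frequency (44100 Hz) and lag range (22 to 674 samples) are borrowed from the worked example given later in this description.

```python
import math


def sine_frame(freq_hz, sample_rate=44100, length=4410):
    """Build a test frame (0.1 sec at 44100 Hz by default) holding a pure sine."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(length)]


def estimate_fundamental(frame, sample_rate=44100, k_min=22, k_max=674):
    """Return (fundamental cycle in samples, fundamental frequency in Hz).

    The cycle is taken as the lag k at which the autocorrelation
    r(k) = sum of x(n) * x(n - k) is largest (assumed search range k_min..k_max).
    """
    n = len(frame)
    best_lag, best_value = None, float("-inf")
    for k in range(k_min, min(k_max, n - 1) + 1):
        # Only indices where both x(i) and x(i - k) exist are summed.
        r = sum(frame[i] * frame[i - k] for i in range(k, n))
        if r > best_value:
            best_lag, best_value = k, r
    # Fundamental frequency = sampling frequency / fundamental cycle
    return best_lag, sample_rate / best_lag
```

For a 441 Hz sine sampled at 44100 Hz the cycle is 100 samples, so `estimate_fundamental(sine_frame(441.0))` returns a lag of 100 and a frequency of 441.0 Hz. A real voice signal is noisier, and a practical estimator would likely need windowing or peak validation that this sketch omits.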

The sampling frequency is the number of sound points sampled in each second. Then, in a scale sequence recorder 12, a succession of the fundamental frequencies of the input sound of the user is converted into a user scale sequence 121, which is then recorded. The relation between scale and frequency is as below:

When the scale is A4, the frequency is 440 Hz. When the scale is increased by a semitone, the frequency is multiplied by $2^{1/12}$, and likewise, when the scale is decreased by a semitone, the frequency is divided by $2^{1/12}$. Therefore, when the scale is increased by 12 semitones, the frequency doubles.

Then, through a scale matcher 13, the user scale sequence 121 is compared to the theme scale sequence 14 to obtain the difference. Here, the theme scale sequence 14 is stored in advance in the music text 15; for example, in a MIDI (Musical Instrument Digital Interface) file, such information can be stored together with the music text. The scale matcher 13 uses a method of Dynamic Time Warping (DTW) correction to compare the difference between the user scale sequence 121 and the theme scale sequence 14, as detailed in the paragraphs that follow.
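Before the matching procedure is described, the scale/frequency relation stated above can be sketched in Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the scale code 69 for A4 is taken from the worked example later in this description, with one code step per semitone.

```python
import math

A4_FREQ_HZ = 440.0   # reference frequency of scale A4
A4_SCALE_CODE = 69   # scale code of A4, as used in the worked example


def frequency_to_scale_code(freq_hz):
    """Convert a fundamental frequency to the nearest scale code.

    Each semitone multiplies the frequency by 2**(1/12), so the number of
    semitones above A4 is 12 * log2(f / 440).
    """
    return A4_SCALE_CODE + round(12 * math.log2(freq_hz / A4_FREQ_HZ))


def scale_code_to_frequency(code):
    """Inverse mapping: the frequency of the scale with the given code."""
    return A4_FREQ_HZ * 2 ** ((code - A4_SCALE_CODE) / 12)
```

With these assumptions, 440 Hz maps to code 69, 880 Hz (one octave up) to code 81, and code 60 corresponds to roughly 261.63 Hz.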

Assume the user scale sequence 121 is n1, n2, . . . , nj, each representing the scale (tone) of the user (singer) calculated for each section of time (e.g. 0.1 sec), and assume the theme scale sequence 14 is m1, m2, . . . , mj, each representing the theme scale in each section of time (e.g. 0.1 sec). Here, the scales are represented as numbers from 1 to 255, one per semitone; for example, scale C4 is represented as 60, scale C#4 as 61, scale B3 as 59, and so forth. Because the positions of the beat points of the singer's voice may not be the same as those of the background music during the singing, a dynamic time correction shall be made during the comparison of the scale sequences, so as to generate correct comparison results, as shown in the following figure:

In the embodiment disclosed above, from the angle of time, n2 and n3 (of the user scale sequence) are corrected according to m2 (of the theme scale sequence), so that the beat point positions of the background music are compared with the beat point positions of the singer's voice at correct and corresponding positions; during transposition, the theme scale sequence transposes along with the user scale sequence.

Assume dist(ni,mk) represents the error between scales ni and mk, and acu_dist(ni,mk) represents the accumulated error along the optimum path up to the node (ni,mk). Then the minimum accumulated error of each node matched in the above figure is:

acu_dist(ni,mk)=dist(ni,mk)+min{acu_dist(ni−1,mk), acu_dist(ni,mk−1), acu_dist(ni−1,mk−1), . . . }

wherein min{. . . } represents the minimum value, and the range in {. . . } is decided empirically. Generally, a range of −2 to +2 is selected for the time correction value. The error of the final matching result is therefore acu_dist(nj,mj), where j is the last time point in this comparison; its value is decided by experiment and is usually higher than 40 (4 sec) and lower than 100 (10 sec). The optimum path refers to the path with the minimum accumulated error. In practice, the path itself does not need to be calculated.
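The recurrence above can be sketched as a small band-limited dynamic time warping routine. This is a sketch under assumptions rather than the disclosed implementation: the function name is hypothetical, the per-node error defaults to the absolute difference of scale codes (the description does not fix a formula at this point), and the time correction range is modelled as a band of ±2 positions around the diagonal.

```python
def dtw_accumulated_error(user, theme, band=2, dist=lambda a, b: abs(a - b)):
    """Return acu_dist at the last node (n_j, m_j) for two scale sequences.

    Only nodes with |i - k| <= band are evaluated, which models the
    empirical time correction range. The optimum path itself is not stored.
    """
    inf = float("inf")
    n, m = len(user), len(theme)
    acu = [[inf] * m for _ in range(n)]
    for i in range(n):
        for k in range(max(0, i - band), min(m, i + band + 1)):
            cost = dist(user[i], theme[k])
            if i == 0 and k == 0:
                acu[i][k] = cost
                continue
            best = inf
            if i > 0:
                best = min(best, acu[i - 1][k])
            if k > 0:
                best = min(best, acu[i][k - 1])
            if i > 0 and k > 0:
                best = min(best, acu[i - 1][k - 1])
            acu[i][k] = cost + best
    return acu[n - 1][m - 1]
```

For example, a user sequence `[60, 60, 62, 64, 64]` matched against a theme `[60, 62, 62, 64, 64]` yields an accumulated error of 0, because the warping absorbs the late note change. If the two sequences differ in length by more than the band, the last node is unreachable and the sketch returns infinity.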

Based on the above method, we can calculate how much transposition is needed for the theme. As shown in FIG. 2, first set the theme scale transposition value s=K1, where s=1 means the scale is increased by a semitone and s=−1 means the scale is decreased by a semitone. Then, use the above method (i.e. the aforementioned Dynamic Time Warping (DTW) correction method) to compare the user scale sequence with the transposed theme scale sequence, and record the accumulated error of the final matching result as Dis(s). Then set s=s+1 and calculate Dis(s) again, until s=K2. Finally, find the transposition value s=s_min for which Dis(s_min) is the minimum, where K1<=s<=K2. Usually, K1=−6 and K2=6.

Then, a transposition judger 16 is used to decide if and when the transposition is needed. The transposition judger 16 carries out the transposition when the error Dis(s_min) is lower than a constant empirical value D. In the transposition, the theme note is transposed by s_min semitones. To make the music harmonious and natural, adjustments are made at set intervals, and usually when the theme note is long.

The music synthesizer 17 synthesizes the digitally recorded music text 15 into actual music waves, which, together with the user's voice, are output by a mixer 18. When transposition is needed, the scale parameter in the music synthesizer 17 is adjusted. In practice, all the notes in the music text 15 are raised or lowered by several semitones. The shift is usually smaller than or equal to 6 semitones, but there is no strict limit, because 12 semitones (one octave) correspond to a factor of two in frequency, and tones whose frequencies differ by a factor of two are perceived as the same tone. When the required upward shift is more than 6 semitones, a downward shift can be used instead; when the required downward shift is more than 6 semitones, an upward shift can be used instead.
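One possible reading of the octave equivalence described above is that any shift can be folded to an equivalent shift of at most 6 semitones. The following Python sketch illustrates that reading; the function name is hypothetical and the folding rule is an assumption, not a step stated in the description.

```python
def fold_transposition(semitones):
    """Fold a shift in semitones to an octave-equivalent shift of at most 6.

    Shifts that differ by 12 semitones are treated as equivalent, so a rise
    of 7 semitones becomes a fall of 5, and a fall of 7 becomes a rise of 5.
    """
    folded = semitones % 12  # Python's % always yields 0..11 here
    if folded > 6:
        folded -= 12
    return folded
```

Note that a shift of −6 folds to +6 in this sketch, since the two are octave-equivalent and only one representative is kept.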

Below is an example of practice:

When the background music starts playing, start recording, with the sound format set as mono, 16 bits, the sampling frequency as 44100 Hz, and the length of each recording as 0.1 sec. In the next step, use the tone estimator 11 to calculate the fundamental frequency of the singer's voice. The method is as follows. Assume the sound recorded is:

x(n), n=0, 1, 2, . . . , N−1, N=4410, then

1. Calculate the autocorrelation function rx(k), wherein:

$r_x(k) = \sum_{n} x(n)\,x(n-k), \quad n = 0, 1, 2, \ldots, N-1, \quad k = 22, 23, 24, \ldots, 674$

The range of value k represents the frequency range to be detected:

44100/22 to 44100/674 = 2004.54 to 65.43 Hz

2. Find $k_{max} = \arg\max_k r_x(k)$, where $k_{max}$ is the value of k at which $r_x(k)$ has its maximum value.

3. Fundamental frequency $f_0 = 44100 / k_{max}$.

Then, convert the fundamental frequency into a scale code. For example, a fundamental frequency of 440 Hz is converted into scale A4 (tone La), whose scale code is 69. A difference of one semitone means a difference of frequency by a factor of $2^{1/12}$ and a difference of scale code by 1. The scale sequence recorder 12 will record the theme scale code in the theme scale sequence 14. In the scale matcher 13, first set K1=−6, K2=6, then set the scale code sequence length as 4 sec (j=40): one calculation is made for every 0.1 sec of recording, so there are 40 calculations in 4 seconds. Assume the theme scale sequence 14 recorded is mi, i=0, 1, 2, . . . , 39, the user's voice scale sequence is ni, i=0, 1, 2, . . . , 39, and the transposition is s. Let dist(mi, nk) be the difference between scale codes mi and nk, with dist(mi, nk)>=0, and let scale codes that differ by an octave (12 semitones) give equal errors, i.e.:

dist (mi, nk)=dist (mi+12*N, nk);

wherein N is an integer. Set the time correction value as −1 to +0; the scale matcher 13 then acts as follows:

-   -   1. Set s=K1
    -   2. Set i=1, and set the initial value of every element of the accumulated error array acu_dist[0..39][0..39] to a very large number, 1000000
    -   3. Calculate acu_dist[0][0]=dist(m0+s, n0)
    -   4. Set j=i−1
    -   5. If j>=40, skip to Step 8
    -   6. acu_dist[i][j]=min{acu_dist[i−1][j−1], acu_dist[i−1][j], acu_dist[i][j−1]}+dist(mi+s, nj), ignoring any term whose index falls outside 0..39
    -   7. j=j+1; if j<=i+1, go back to Step 5
    -   8. i=i+1; if i<40, go back to Step 4
    -   9. Dis(s)=acu_dist[39][39]
    -   10. s=s+1
    -   11. If s<=K2, go back to Step 2
    -   12. End.
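The steps above can be sketched in Python as follows. This is an illustrative sketch, not the disclosed implementation: the function names are hypothetical, the octave-invariant error `octave_dist` is one assumed formula that satisfies dist(mi, nk)=dist(mi+12*N, nk) and dist>=0, the inner loop follows Steps 4 to 7 (j from i−1 to i+1), and neighbours with an index below 0 are skipped.

```python
def octave_dist(a, b):
    """Assumed error between two scale codes, equal for octave-shifted codes."""
    d = abs(a - b) % 12
    return min(d, 12 - d)


def best_transposition(theme, user, k1=-6, k2=6, band=1):
    """Return (s_min, errors), where errors[s] is Dis(s) for K1 <= s <= K2."""
    big = 1000000  # initial "very large number" from Step 2
    length = len(theme)
    errors = {}
    for s in range(k1, k2 + 1):                      # Steps 1, 10, 11
        acu = [[big] * length for _ in range(length)]
        acu[0][0] = octave_dist(theme[0] + s, user[0])  # Step 3
        for i in range(1, length):                   # Steps 4 and 8
            for j in range(max(0, i - band), min(length, i + band + 1)):
                previous = [acu[i - 1][j]]
                if j > 0:
                    previous.append(acu[i - 1][j - 1])
                    previous.append(acu[i][j - 1])
                # Step 6: best predecessor plus the error of this node
                acu[i][j] = min(previous) + octave_dist(theme[i] + s, user[j])
        errors[s] = acu[length - 1][length - 1]      # Step 9
    s_min = min(errors, key=errors.get)
    return s_min, errors
```

For instance, if the user sings every theme note three semitones higher, or nine semitones lower (the same notes one octave down), the sketch returns s_min=3 with Dis(3)=0.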

Then, in the transposition judger 16, if Dis(s_min)<=40 (40 is an empirical value) and the length of the theme note being played is >=1 sec, transpose the theme note by s_min semitones, and carry out the next transposition only after an interval of more than 4 sec (4 sec is an empirical value). Finally, the music synthesizer 17 synthesizes the digitally recorded music text into actual music waves, which are then output together with the user's voice by the mixer 18 and speaker 19.
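The decision rule of the transposition judger in this example can be sketched as a single predicate. The function and parameter names are hypothetical; the default thresholds (40, 1 sec, 4 sec) are the empirical values quoted above.

```python
def should_transpose(dis_min, note_length_sec, seconds_since_last,
                     max_error=40, min_note_sec=1.0, min_interval_sec=4.0):
    """Return True when the judger would transpose the theme by s_min.

    dis_min            -- Dis(s_min), the smallest accumulated matching error
    note_length_sec    -- length of the theme note currently being played
    seconds_since_last -- time elapsed since the previous transposition
    """
    return (dis_min <= max_error
            and note_length_sec >= min_note_sec
            and seconds_since_last > min_interval_sec)
```

With these defaults, `should_transpose(35, 1.2, 5.0)` is true, while the same call made exactly 4.0 sec after the previous transposition is false, because the interval must be more than 4 sec.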

1. An automatic tone-following method for music accompanying devices, the method comprising the steps of: providing a tone estimator to calculate the fundamental frequency of the user's voice at set intervals; converting fundamental frequency of the user's voice in a scale sequence recorder into user scale sequence, which is then recorded; comparing a difference between the user scale sequence and the theme scale sequence in preset music text through a scale matcher; the scale matcher comparing the difference between the user scale sequence and theme scale sequence through a method of dynamic time warping correction; deciding on if and when a transposition is needed for the accompanying music through a transposition judger; if a transposition is needed, the scale parameter in a music synthesizer being automatically adjusted; synthesizing digitally recorded music text into actual music waves by a music synthesizer, the waves being outputted together with the user's voice by the mixer and the speaker.
2. The method defined in claim 1, wherein the music synthesizer adjusts the scale parameter by increase or decrease by several scales of all the note scales in the music text.
3. The method defined in claim 2, wherein the number of scales must be smaller than or equal to 6 semitones.
4. The method defined in claim 1, wherein the theme scale sequence is recorded in advance in the music text.
5. An automatic tone-following system for music accompanying devices, comprising: a tone estimator means to calculate the fundamental frequency of the user's voice at set intervals; a scale sequence recorder means to convert the fundamental frequency of the user's voice into user scale sequence and to record the fundamental frequency; a scale matcher means to compare the difference between the user scale sequence and the theme scale sequence in preset music text, the scale matcher comparing the difference between the user scale sequence and theme scale sequence through a method of dynamic time warping correction; a transposition judger means to judge if and when a transposition is needed for the accompanying music; a music synthesizer to automatically adjust the scale parameter in the music synthesizer when the transposition judger decides a transposition is needed; the music synthesizer synthesizes digitally recorded music text into actual music waves, which are then outputted together with the user's voice by a preset mixer.